Abstract

Φ-values are experimental measures of the effects of mutations on the folding kinetics of a protein. A central question is which structural information Φ-values contain about the transition state of folding. Traditionally, a Φ-value is interpreted as the 'nativeness' of a mutated residue in the transition state. However, this interpretation is often problematic because it assumes a linear relation between the nativeness of the residue and its free-energy contribution. We present here a better structural interpretation of Φ-values for mutations within a given helix. Our interpretation is based on a simple physical model that distinguishes between secondary and tertiary free-energy contributions of helical residues. From a linear fit of our model to the experimental data, we obtain two structural parameters: the extent of helix formation in the transition state, and the nativeness of tertiary interactions in the transition state. We apply our model to all proteins with well-characterized helices for which more than 10 Φ-values are available: protein A, CI2, and protein L. The model captures nonclassical Φ-values < 0 or > 1 in these helices, and explains how different mutations at a given site can lead to different Φ-values.



Introduction 

There has been much interest in understanding the rates of protein folding in terms of transition-state structures. We focus here on two-state proteins, i.e. those proteins that fold with single-exponential kinetics. The folding kinetics of two-state proteins is often investigated by mutational analysis [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. The effect of a given mutation on the protein's folding kinetics is quantified by its Φ-value [25,26]

$$\Phi = \frac{RT \ln(k_{wt}/k_{mut})}{\Delta G_N} \qquad (1)$$

Here, $k_{wt}$ is the folding rate of the wildtype protein, $k_{mut}$ is the folding rate of the mutant protein, and $\Delta G_N$ is the change in protein stability induced by the mutation. The stability $G_N$ of a protein is the free-energy difference between the denatured state D and the native state N. In classical transition-state theory, the folding rate of a two-state protein is proportional to $\exp(-G_T/RT)$, where $G_T$ is the free-energy difference from the denatured state to the transition state.¹ In that notation, Φ-values have the form

$$\Phi = \frac{\Delta G_T}{\Delta G_N} \qquad (2)$$

where each Δ in this expression represents the change due to the mutation.
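As a minimal sketch, eq. (1) can be evaluated directly from measured rates. The function name `phi_value`, the temperature T = 298 K, and the gas constant in kcal/(mol K) are assumptions for illustration only. If the prefactor is unchanged by the mutation, $RT \ln(k_{wt}/k_{mut})$ equals $\Delta G_T$, so the same number is the ratio of eq. (2).

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def phi_value(k_wt, k_mut, dG_N, T=298.0):
    """Eq. (1): Phi = RT ln(k_wt / k_mut) / dG_N.

    k_wt, k_mut: folding rates of wildtype and mutant (same units).
    dG_N: mutation-induced change of the stability G_N in kcal/mol
    (positive for a destabilizing mutation, since G_N < 0 for a stable protein).
    """
    if dG_N == 0:
        raise ValueError("Phi is undefined when dG_N = 0")
    return R * T * math.log(k_wt / k_mut) / dG_N


# Hypothetical mutant: folding slowed five-fold, destabilized by 1.5 kcal/mol.
print(phi_value(k_wt=100.0, k_mut=20.0, dG_N=1.5))
```

A destabilizing mutation that speeds up folding ($k_{mut} > k_{wt}$) gives a negative Φ in this formula, which is one form of the nonclassical values discussed below.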

By definition, Φ-values are energetic quantities, related to changes in the protein's stability and folding rate. Do Φ-values also give information about the structures that the protein adopts when it is in a kinetic "bottleneck" or transition state [26,27,28,29,30]? In the traditional interpretation, Φ-values are taken to indicate the degree of structure formation of the mutated residue in the transition-state ensemble T. A Φ-value of 1 is interpreted to indicate that the residue is fully native-like structured in T, since the mutation shifts the free energy of the transition state T by the same amount as the free energy of the native state N. A Φ-value of 0 is interpreted to indicate that the residue is as unstructured in T as in the denatured state D, since the mutation does not shift the free-energy difference between these two states. Φ-values between 0 and 1 are taken to indicate partial native-like structure in T.

Modelers often calculate Φ-values based on this traditional interpretation. In many approaches, Φ-values are calculated from the fraction of contacts a residue forms in the transition state T, compared to the fraction of contacts in the native and the denatured state [31,32,33,34,35,36,37,38,39,40,41,42,43,44], or from similar structural parameters [45,46]. Notable exceptions are a recent MD study of an ultrafast mini-protein in which Φ-values are calculated from the rates for the wildtype and mutants via eq. (1) [47], and the calculation of Φ-values from free-energy shifts of the transition-state ensemble using eq. (2) [48].



¹ In principle, the prefactor of this proportionality relation could also depend on
the mutation, but this dependence is usually neglected. 
However, there are reasons to question this simple interpretation of Φ-values. First, some Φ-values are negative or larger than 1 [49,50]. These 'nonclassical' Φ-values cannot be interpreted as a degree of structure formation, because this would have the nonsensical implication of 'less structured than D' or 'more structured than N'. Second, Φ-values are sometimes significantly different for different mutations at a given chain position, contradicting the usual assumption that the degree of nativeness in the transition state is simply a property of the position of a monomer in the protein. Third, the Φ-values of neighboring residues within a given secondary structure often span a wide range. In the traditional interpretation, this means that some of the helical residues are unstructured in the transition state, while other residues, often direct neighbors, are highly structured. This contradicts the notion that secondary structures are cooperative.

The inconsistencies of the traditional interpretation result from the assumption that the mutation-induced free-energy changes of a residue are proportional to a single structural parameter, the 'degree of nativeness' of this residue in the transition state T. Is there a consistent structural interpretation of Φ-values, and if so, how many structural parameters do we need to capture the mutation-induced free-energy changes? We show here that the Φ-values for multiple mutations in a given helix can be consistently interpreted in a simple physical model that takes into account just two structural parameters for the whole helix: $\chi_\alpha$, the degree of secondary structure formation of the helix in the transition-state ensemble T, and $\chi_t$, the degree of tertiary structure formation of the helix in T. In our model, the mutation-induced free-energy changes are split into two components. The overall stability change $\Delta G_N$ is split into two parts: the change in intrinsic helix stability $\Delta G_\alpha$, and the change in tertiary free energy $\Delta G_t$ caused by the mutation. Similarly, $\Delta G_T$, the change of the free-energy difference between the transition state and the denatured state, is split into a change $\chi_\alpha \Delta G_\alpha$ in secondary free energy and a change $\chi_t \Delta G_t$ in tertiary free energy. The Φ-values for the mutations in the helix then have the general form

$$\Phi = \frac{\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t}{\Delta G_N} = \chi_t + (\chi_\alpha - \chi_t)\,\frac{\Delta G_\alpha}{\Delta G_N} \qquad (3)$$

The second expression simply results from replacing $\Delta G_t$ by $\Delta G_N - \Delta G_\alpha$. The two parameters $\chi_\alpha$ and $\chi_t$ of our model are 'collective' structural parameters for all mutations in the helix. Different Φ-values then simply result from different free-energetic 'signatures' $\Delta G_\alpha$ and $\Delta G_N$ of the mutations. In particular, eq. (3) captures that different mutations of the same residue can lead to different Φ-values, and that Φ-values can be 'nonclassical', i.e. < 0 or > 1. Since the two structural parameters $\chi_\alpha$ and $\chi_t$ range between 0 and 1, a nonclassical Φ-value implies that the changes $\Delta G_\alpha$ and $\Delta G_t$ in secondary and tertiary free energy caused by the mutation have opposite signs.
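The nonclassical case of eq. (3) can be checked with a few numbers. In this minimal sketch, `phi_model` is a hypothetical helper, and the parameters $\chi_\alpha = 1$, $\chi_t = 0.2$ and the free-energy changes are invented for illustration. Eq. (3) is a weighted average $\chi_\alpha x + \chi_t (1 - x)$ with $x = \Delta G_\alpha/\Delta G_N$, so Φ stays between 0 and 1 whenever $0 \le x \le 1$, i.e. whenever $\Delta G_\alpha$ and $\Delta G_t$ have the same sign.

```python
def phi_model(chi_alpha: float, chi_t: float, dG_alpha: float, dG_N: float) -> float:
    """Phi-value predicted by eq. (3): chi_t + (chi_alpha - chi_t) * dG_alpha / dG_N.

    dG_alpha is the change in intrinsic helix stability and dG_N the total
    stability change; the tertiary part is implied as dG_t = dG_N - dG_alpha.
    """
    if dG_N == 0:
        raise ValueError("Phi is undefined for dG_N = 0")
    return chi_t + (chi_alpha - chi_t) * dG_alpha / dG_N


# Hypothetical helix: fully formed in T, weak tertiary nativeness.
chi_alpha, chi_t = 1.0, 0.2

# Same-sign changes (dG_t = +0.7): classical value, about 0.44.
print(phi_model(chi_alpha, chi_t, dG_alpha=0.3, dG_N=1.0))

# Helix-stabilizing, tertiary-destabilizing mutation (dG_t = +1.5): about -0.2.
print(phi_model(chi_alpha, chi_t, dG_alpha=-0.5, dG_N=1.0))
```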

To apply our model, we first estimate $\Delta G_\alpha$, the change in helical stability, for each mutation in a particular helix, using standard helix propensity methods. We then plot all experimental values of Φ versus $\Delta G_\alpha/\Delta G_N$, and obtain the two structural parameters $\chi_\alpha$ and $\chi_t$ from a linear fit of eq. (3). In principle, the two structural parameters can be extracted if Φ-values and stability changes for at least two mutations in a helix are available. However, to test our model and to obtain reliable values for $\chi_\alpha$ and $\chi_t$, we focus here on helices for which more than 10 Φ-values have been determined. The modeling quality can then be assessed from the standard deviation of the data points from the regression line, and from the Pearson correlation coefficient between Φ and $\Delta G_\alpha/\Delta G_N$. Our model can be applied to all mutations of a helix, or to a subset of mutations that affect only the tertiary interactions with one other structural element.



Models and methods 

Transition-state conformations and folding rate

We model the transition state as an ensemble of M different conformations 
(see Fig. 1). Each transition-state conformation is directly connected to the 
native state N and to the denatured state D. The model thus has M parallel 
folding and unfolding routes. 

We assume that the protein is stable, i.e. that $G_N < 0$. We also assume that the free-energy barrier for each transition-state conformation is significantly larger than the thermal energy, i.e. that $G_m/RT \gg 1$ [51,52]. The rate of folding along each route m is then proportional to $\exp(-G_m/RT)$, and the total folding rate, as the sum over all the parallel routes, is

$$k = c \sum_{m=1}^{M} e^{-G_m/RT} \qquad (4)$$

where c is a constant prefactor.²
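Eq. (4) can be sketched as a sum over routes. Here the helpers `folding_rate` and `route_fractions`, the barrier values, T = 298 K and the unit prefactor c are assumptions chosen only for illustration. The sketch also shows the fraction of the folding flux carried by each route, which is the Boltzmann weight that reappears in the ensemble averages below.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def folding_rate(barriers, c=1.0, T=298.0):
    """Eq. (4): k = c * sum_m exp(-G_m / RT) over M parallel routes.

    barriers: free energies G_m (kcal/mol) of the transition-state
    conformations relative to D; c: mutation-independent prefactor.
    """
    RT = R * T
    return c * sum(math.exp(-G / RT) for G in barriers)


def route_fractions(barriers, T=298.0):
    """Fraction of the total folding flux carried by each route m."""
    RT = R * T
    w = [math.exp(-G / RT) for G in barriers]
    total = sum(w)
    return [wi / total for wi in w]


# Two hypothetical routes whose barriers differ by 1 kcal/mol:
# the lower barrier carries most of the flux.
print(route_fractions([5.0, 6.0]))
```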

² This model is a generalization of our previous model [53] with M = 2 transition-state conformations. The master equation that describes the folding kinetics of this model can be solved exactly. Eq. (4) is obtained from the exact solution in the limit of large transition-state barriers $G_m$ [53].

Decomposition of free energy changes for helical mutations

Consider all mutations i = 1, 2, ... within one particular α-helix of a protein. The effect of these mutations on the stability and folding kinetics can be experimentally characterized by the stability changes $\Delta G_N$ and by the Φ-values. We suppose that the experimentally measured change in stability $\Delta G_N$ for each mutation is the sum of effects on the stability of the helix and on the interactions of the helix with its tertiary neighbors:

$$\Delta G_N = \Delta G_\alpha + \Delta G_t \qquad (5)$$

The first term, $\Delta G_\alpha$, is the change in the intrinsic helix stability. The second term, $\Delta G_t$, is the change in tertiary free energy of the helix interactions with neighboring structures. Below, we estimate $\Delta G_\alpha$ either with the program AGADIR [54,55,56] or from a helix propensity scale [57]. The term $\Delta G_t$ is then simply obtained by subtracting $\Delta G_\alpha$ from the experimentally measured stability change $\Delta G_N$.

We also decompose each $\Delta G_m$, the mutation-induced free-energy change for the transition-state conformation m, into two terms:

$$\Delta G_m = s_m \Delta G_\alpha + t_m \Delta G_t \qquad (6)$$

Here, $s_m$ is either 0 or 1, depending on whether the helix is formed or not in the transition-state conformation m. The coefficient $t_m$ is between 0 and 1 and represents the degree of tertiary structure formation in conformation m.

Structural and energetic components of Φ-values

The folding rate for the mutant protein i is $k_{mut} = k(G_1 + \Delta G_1, G_2 + \Delta G_2, \ldots, G_M + \Delta G_M)$, with k given in eq. (4). The folding rate of the wildtype is $k_{wt} = k(G_1, G_2, \ldots, G_M)$. We assume here that the mutations do not affect the prefactor c in eq. (4). For small values $|\Delta G_m|$ of the mutation-induced free-energy changes, a Taylor expansion of $\ln k_{mut}$ gives

$$\ln k_{mut} - \ln k_{wt} \approx -\frac{1}{RT}\,\frac{\sum_{m=1}^{M} \Delta G_m\, e^{-G_m/RT}}{\sum_{m=1}^{M} e^{-G_m/RT}} \qquad (7)$$



With the decomposition of the $\Delta G_m$ in eq. (6), we obtain

$$\ln k_{mut} - \ln k_{wt} \approx -\frac{1}{RT}\left(\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t\right) \qquad (8)$$

with the two terms

$$\chi_\alpha = \frac{\sum_{m} s_m\, e^{-G_m/RT}}{\sum_{m} e^{-G_m/RT}} \quad \text{and} \quad \chi_t = \frac{\sum_{m} t_m\, e^{-G_m/RT}}{\sum_{m} e^{-G_m/RT}} \qquad (9)$$

The term $\chi_\alpha$ represents the Boltzmann-weighted average of the secondary structure parameter $s_m$ over the transition-state ensemble T. $\chi_\alpha$ ranges from 0 to 1 and indicates the average degree of structure formation for the helix in T. The value $\chi_\alpha = 1$ indicates that the helix is formed in all transition-state conformations m, and $\chi_\alpha = 0$ indicates that the helix is formed in none of the transition-state conformations. Values of $\chi_\alpha$ between 0 and 1 indicate that the helix is formed in some of the transition-state conformations and not formed in others. The term $\chi_t$ represents the Boltzmann-weighted average of the tertiary structure parameter $t_m$ in T, and also ranges from 0 to 1. Inserting eq. (8) into the definition in eq. (1) gives $\Phi = (\chi_\alpha \Delta G_\alpha + \chi_t \Delta G_t)/\Delta G_N$, which is the general form (3) of the Φ-values for helical mutations in our model; the second form of eq. (3) follows with eq. (5).³
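The linearization in eqs. (7) to (9) can be checked numerically. This sketch builds a hypothetical ensemble of M = 3 transition-state conformations (the barriers $G_m$, the parameters $s_m$ and $t_m$, and the small mutation are all invented), computes Φ exactly from the rates of eq. (4) inserted into eq. (1), and compares it with eq. (3) evaluated with the ensemble averages of eq. (9). RT = 0.593 kcal/mol (about 298 K) is an assumption; the helper names are hypothetical.

```python
import math

RT = 0.593  # kcal/mol; assumes roughly 298 K


def log_rate(G):
    """ln k from eq. (4), up to the mutation-independent constant ln c."""
    return math.log(sum(math.exp(-g / RT) for g in G))


def ensemble_average(param, G):
    """Boltzmann-weighted average of a per-conformation parameter, as in eq. (9)."""
    w = [math.exp(-g / RT) for g in G]
    return sum(p * wi for p, wi in zip(param, w)) / sum(w)


# Hypothetical ensemble of M = 3 transition-state conformations.
G = [5.0, 5.4, 6.1]   # barriers G_m in kcal/mol
s = [1, 1, 0]         # s_m: helix formed (1) or not (0)
t = [0.6, 0.2, 0.1]   # t_m: degree of tertiary structure

chi_alpha = ensemble_average(s, G)
chi_t = ensemble_average(t, G)

# Hypothetical small mutation; eq. (5) gives the tertiary part.
dG_alpha, dG_N = 0.02, 0.05
dG_t = dG_N - dG_alpha
dG = [sm * dG_alpha + tm * dG_t for sm, tm in zip(s, t)]  # eq. (6)

# Phi from the exact rates of eq. (4) inserted into eq. (1) ...
G_mut = [g + d for g, d in zip(G, dG)]
phi_exact = -RT * (log_rate(G_mut) - log_rate(G)) / dG_N
# ... and from the linearized result, eq. (3).
phi_linear = chi_t + (chi_alpha - chi_t) * dG_alpha / dG_N

print(f"chi_alpha={chi_alpha:.3f} chi_t={chi_t:.3f}")
print(f"Phi exact={phi_exact:.4f} linear={phi_linear:.4f}")
```

For free-energy changes of this size the two values agree to within about 0.01; the difference is the second-order term neglected in eq. (7) and grows with the spread of the $\Delta G_m$.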



More than twenty two-state proteins with α/β [1,2,3,4,5,6,7,8,9,10,11,12], all-α [13,14,15,16], or all-β structures [17,18,19,20,21,22,23,24] have been investigated by mutational analysis in the past few years. Mutational data are also available for several proteins that fold via intermediates [58,59,60] or apparent intermediates [61]. We focus here on the well-characterized α-helices of two-state proteins for which at least 10 Φ-values apiece are available: helices 2 and 3 of protein A, and the helices of CI2 and protein L. Protein A is an α-helical protein with three helices; CI2 and protein L are α/β-proteins with a single α-helix packed against a β-sheet.



³ In principle, our parameter $\chi_t$ for the tertiary interactions could also depend on the residue position. To derive eq. (3), we do not have to assume that the tertiary parameters $t_m$ for the M transition-state conformations are independent of the residue position and/or mutation. However, we focus here on the simplest version of our model and show that a consistent structural interpretation of experimental Φ-values in a helix can be obtained with just two structural parameters $\chi_\alpha$ and $\chi_t$ for the whole helix, which implies cooperativity of secondary as well as tertiary interactions.
Results and discussion



Our analysis of experimental Φ-values requires an estimate of the mutation-induced changes $\Delta G_\alpha$ of the intrinsic helix stability. For the CI2 helix, we estimate $\Delta G_\alpha$ both with the program AGADIR [54,55,56] and from a helix propensity scale [57], see Table 1. The change in intrinsic helix stability $\Delta G_\alpha$ can be estimated from the helical content predicted by AGADIR via $\Delta G_\alpha = RT \ln(P^{wt}/P^{mut})$. Here, $P^{wt}$ is the helical content of the wildtype helix, and $P^{mut}$ the helical content of the mutant. The program AGADIR is based on helix/coil transition theory, with parameters fitted to data from circular dichroism (CD) spectroscopy. In Table 1, the values for $\Delta G_\alpha$ obtained from AGADIR are compared to values from a helix propensity scale [57]. Helix propensities of the amino acids are typically given as free-energy differences with respect to alanine. We use the propensity scale of Pace and Scholtz [57], which has been obtained from experimental data on 11 different helical systems. For example, the value $\Delta G_\alpha = 0.29$ kcal/mol for the mutant E15D in the CI2 helix is simply the difference between the helix propensity 0.69 kcal/mol of the amino acid D (aspartic acid) and the propensity 0.40 kcal/mol of the amino acid E (glutamic acid). The helix propensity scale can be applied to residues at 'inner' positions of a helix, but not to residues at the termini or 'caps' of the helix. The N-terminal residues of the CI2 helix are residues 12 and 13; the C-terminal residues are residues 23 and 24. For the 8 mutations at inner positions of the CI2 helix, the values for $\Delta G_\alpha$ from AGADIR and from the helix propensity scale correlate with a Pearson correlation coefficient of 0.77. For the other three helices considered here, the helicities predicted by AGADIR are significantly smaller than the helicity of around 5 % predicted for the CI2 helix. Estimates of $\Delta G_\alpha$ based on AGADIR are therefore not reliable for these helices. The values of $\Delta G_\alpha$ shown in Tables 2 to 4 are calculated from helix propensities.
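Both estimates of $\Delta G_\alpha$ described above reduce to one line each. In this sketch the table `PROPENSITY` contains only the three values quoted in the text (alanine as the zero reference, E = 0.40 and D = 0.69 kcal/mol); all other residues would have to be taken from the scale in [57]. The helper names, RT = 0.593 kcal/mol (about 298 K), and the helicities in the usage lines are assumptions for illustration.

```python
import math

# Helix propensities in kcal/mol relative to Ala, only the values quoted in the
# text; the remaining residues would come from the Pace-Scholtz scale [57].
PROPENSITY = {"A": 0.0, "E": 0.40, "D": 0.69}


def dG_alpha_from_scale(wt_residue, mut_residue, scale=PROPENSITY):
    """Change of intrinsic helix stability for a mutation at an inner helix position."""
    return scale[mut_residue] - scale[wt_residue]


def dG_alpha_from_helicity(p_wt, p_mut, RT=0.593):
    """dG_alpha = RT ln(P_wt / P_mut) from predicted helical contents (e.g. AGADIR)."""
    return RT * math.log(p_wt / p_mut)


print(round(dG_alpha_from_scale("E", "D"), 2))  # E15D in the CI2 helix: 0.29

# Hypothetical helicities: a mutant that lowers helicity from 5 % to 4 %
# gets a positive (destabilizing) dG_alpha.
print(dG_alpha_from_helicity(0.05, 0.04))
```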

The three structural elements of protein A are its three helices. Based on the contact map of protein A shown in Fig. 2, the mutations in helix 2 of protein A can be divided into three groups: 'purely secondary' mutations that do not affect tertiary contacts; mutations that affect only tertiary contacts with helix 1; and mutations that affect tertiary contacts with both helix 1 and helix 3. If only the first two groups of mutations are considered in our analysis, $\chi_t$ represents the average degree of structure formation with helix 1. If all groups, and thus all mutations, are considered, $\chi_t$ is the average degree of structure formation with helices 1 and 3. In the case of helix 3, we distinguish between mutations that affect tertiary contacts with either helix 1 or helix 2, and mutations that affect none of the tertiary interactions, see Table 3. In the case of the protein L helix, the two other structural elements are the terminal β-hairpins, see Fig. 3 and Table 4. In the case of CI2, we do not distinguish between different tertiary contacts. One reason is that there are at least three other structural elements to consider: the three strand pairings of the four-stranded β-sheet that is packed against the CI2 helix, including the pairings β2β3 and β1β4 [53]. Another reason is that the degree $\chi_t$ of tertiary structure formation in the transition state is small for this helix.

The structural parameters $\chi_\alpha$ and $\chi_t$ obtained from our analysis, shown in Figs. 4 to 7, are summarized in Table 5. We estimate the overall errors of $\chi_\alpha$ and $\chi_t$, which result from experimental errors in Φ and $\Delta G_N$ and from modeling errors, as ±0.05 for the CI2 helix and helix 2 of protein A, and as ±0.1 for helix 3 of protein A and the protein L helix. The $\chi_\alpha$ values for the CI2 helix and helix 2 of protein A are close to 1. This indicates that these helices are fully formed in the transition-state ensemble. In contrast, $\chi_\alpha$ for helix 3 of protein A is close to 0, indicating that this helix is not formed in the transition state. $\chi_\alpha$ for the helix in protein L indicates a partial degree of helix formation between 20 and 30 %. Our $\chi_t$ values indicate that the degree of tertiary structure formation in the transition state is around 16 % for the CI2 helix, around 50 % for helix 2 of protein A, and around 30 % for helix 3 of protein A. The $\chi_t$ values for the protein L helix show a small degree of tertiary structure formation with hairpin 1 (around 15 %) and no tertiary structure formation with hairpin 2.

To assess the quality of our modeling, we consider two quantities: the correlation coefficient r, and the estimated standard deviation SD of the data points from the regression line. High correlation coefficients, up to 0.9 and larger, indicate a high quality of modeling. However, the correlation coefficient can only be used to assess the modeling quality in cases where the structural parameters $\chi_\alpha$ and $\chi_t$ are sufficiently different from each other. The case $\chi_\alpha = \chi_t$ corresponds to a regression line with slope 0, and hence a correlation coefficient of 0, irrespective of how well the data are represented by this line. For small differences between $\chi_\alpha$ and $\chi_t$, the correlation coefficient r is dominated by the experimental errors in Φ. This is the case for the mutations in the protein L helix that affect tertiary contacts with hairpin 1, see Table 5. The slope of the regression line is almost zero for this data set, see Fig. 7. Here, the relatively small standard deviation of 0.1 of the data points from the regression line indicates that our model is in good agreement with the experimental data.
We only consider here mutations with stability changes $\Delta G_N > 0.7$ kcal/mol. Because of experimental errors, Φ-values for mutations with smaller stability changes are generally considered unreliable [63,24,64]. In our previous work [53], we considered all the published mutations for the CI2 helix, including those for which $\Delta G_N$ is significantly smaller than 0.7 kcal/mol. The correlation coefficient 0.91 obtained here for the subset of mutations with $\Delta G_N > 0.7$ kcal/mol is larger than the correlation coefficient 0.85 for all mutations. The significantly larger reliability threshold of 1.7 kcal/mol for $\Delta G_N$ obtained by Sanchez and Kiefhaber [62] is based on the assumption that different mutations at the same residue position should lead to the same Φ-value. In our model, different Φ-values for mutations at the same site result from different effects on the intrinsic helix stability $G_\alpha$ and the tertiary free energy $G_t = G_N - G_\alpha$.

In our model, nonclassical Φ-values < 0 or > 1 can arise if $\Delta G_\alpha/\Delta G_N$ is < 0 or > 1. Since $\Delta G_N = \Delta G_\alpha + \Delta G_t$, this implies that $\Delta G_\alpha$ and $\Delta G_t$ have opposite signs. Our model reproduces the clearly negative Φ-values for the mutations D23A in the CI2 helix and D38A in the protein L helix. Both mutations stabilize the helix (i.e. $\Delta G_\alpha < 0$) but destabilize tertiary interactions ($\Delta G_t > 0$).⁴



Conclusions 

We have shown how to obtain a structural interpretation of Φ-values for multiple mutations within a single helix. Combined with any scale of helical propensities, our model shows how linear fitting of experimental data leads to two structural quantities: the extent of helix formation in the transition state, and the extent to which the helix interactions with neighboring tertiary structure are formed in the transition state. The method gives a simple physical interpretation of nonclassical Φ-values: nonclassical values arise if a mutation stabilizes a helix while destabilizing its interactions with neighboring parts of the protein, or vice versa. The model also explains how two different mutations at the same site can have different effects on the kinetics: this difference is traced back to different effects of the mutations on the intrinsic helix stability versus tertiary stability. Hence, this model appears to give simple, physically consistent structural explanations for experimentally measured Φ-values.

⁴ In our previous article [53], we had erroneously stated that nonclassical Φ-values can arise for mutations that only shift the free energy of the denatured state, but not the free energies of the transition state and the native state. This is not the case. Indeed, the Φ-value for these hypothetical mutations is 1, since $\Delta G_T = \Delta G_N$.






References 



[1] Itzhaki, L. S., Otzen, D. E., & Fersht, A. R. (1995). The structure of the transition
state for folding of chymotrypsin inhibitor 2 analysed by protein engineering 
methods: Evidence for a nucleation-condensation mechanism for protein folding. 
J. Mol. Biol. 254, 260-288. 

[2] Villegas, V., Martinez, J. C., Aviles, F. X., & Serrano, L. (1998). Structure of
the transition state in the folding process of human procarboxypeptidase A2 
activation domain. J. Mol. Biol. 283, 1027-1036. 

[3] Chiti, F., Taddei, N., White, P. M., Bucciantini, M., Magherini, F., Stefani, M.,
& Dobson, C. M. (1999). Mutational analysis of acylphosphatase suggests the 
importance of topology and contact order in protein folding. Nat. Struct. Biol. 
6, 1005-1009. 

[4] Ternstrom, T., Mayor, U., Akke, M., & Oliveberg, M. (1999). From snapshot 
to movie: Φ analysis of protein folding transition states taken one step further.
Proc. Natl. Acad. Sci. USA 96, 14854-14859. 

[5] Fulton, K. F., Main, E. R. G., Daggett, V., & Jackson, S. E. (1999). Mapping 
the interactions present in the transition state for unfolding/folding of FKBP12. 
J. Mol. Biol. 291, 445-461. 

[6] Kim, D. E., Fisher, C., & Baker, D. (2000). A breakdown of symmetry in the
folding transition state of protein L. J. Mol. Biol. 298, 971-984. 

[7] McCallister, E. L., Alm, E., & Baker, D. (2000). Critical role of β-hairpin
formation in protein G folding. Nat. Struct. Biol. 7, 669-673. 

[8] Otzen, D. E., & Oliveberg, M. (2002). Conformational plasticity in folding of 
the split β-α-β protein S6: Evidence for burst-phase disruption of the native
state. J. Mol. Biol. 317, 613-627. 

[9] Hedberg, L., & Oliveberg, M. (2004). Scattered Hammond plots reveal second 
level of site-specific information in protein folding: φ′(β‡). Proc. Natl. Acad.
Sci. USA 101, 7606-7611. 

[10] Anil, B., Sato, S., Cho, J. H., & Raleigh, D. P. (2005). Fine structure analysis 
of a protein folding transition state; distinguishing between hydrophobic 
stabilization and specific packing. J. Mol. Biol. 354, 693-705. 

[11] Went, H. M., & Jackson, S. E. (2005). Ubiquitin folds through a highly polarized
transition state. Protein Engineering, Design & Selection 18, 229-237.

[12] Wilson, C. J., & Wittung-Stafshede, P. (2005). Snapshots of a dynamic folding 
nucleus in zinc-substituted Pseudomonas aeruginosa azurin. Biochemistry 44, 
10054-10062. 






[13] Kragelund, B. B., Osmark, P., Neergaard, T. B., Schiodt, J., Kristiansen, K., 
Knudsen, J., & Poulsen, F. M. (1999). The formation of a native-like structure
containing eight conserved hydrophobic residues is rate limiting in two-state
protein folding of ACBP. Nature Struct. Biol. 6, 594-601.

[14] Gianni, S., Guydosh, N. R., Khan, F., Caldas, T. D., Mayor, U., White, G. W.
N., DeMarco, M. L., Daggett, V., & Fersht, A. R. (2003). Unifying features in
protein-folding mechanisms. Proc. Natl. Acad. Sci. USA 100, 13286-13291.

[15] Sato, S., Religa, T. L., Daggett, V., & Fersht, A. R. (2004). Testing protein- 
folding simulations by experiment: B domain of protein A. Proc. Natl. Acad. 
Sci. USA 101, 6952-6956. 

[16] Teilum, K., Thormann, T., Caterer, N. R., Poulsen, H. I., Jensen, P. H., 
Knudsen, J., Kragelund, B. B., & Poulsen, F. M. (2005). Different secondary
structure elements as scaffolds for protein folding transition states of two 
homologous four-helix bundles. Proteins 59, 80-90. 

[17] Martinez, J. C., & Serrano, L. (1999). The folding transition state between SH3 
domains is conformationally restricted and evolutionarily conserved. Nature 
Struct. Biol. 6, 1010-1016. 

[18] Riddle, D. S., Grantcharova, V. P., Santiago, J. V., Alm, E., Ruczinski, I., &
Baker, D. (1999). Experiment and theory highlight role of native state topology 
in SH3 folding. Nature Struct. Biol. 6, 1016-1024. 

[19] Hamill, S. J., Steward, A., & Clarke, J. (2000). The folding of an 
immunoglobulin-like greek key protein is defined by a common-core nucleus 
and regions constrained by topology. J. Mol. Biol. 297, 165-178. 

[20] Fowler, S.B., & Clarke, J. (2001). Mapping the folding pathway of an 
Immunoglobulin domain: Structural detail from Φ value analysis and movement
of the transition state. Structure 9, 355-366. 

[21] Cota, E., Steward, A., Fowler, S. B., & Clarke, J. (2001). The folding nucleus of a
fibronectin type III domain is composed of core residues of the immunoglobulin
fold. J. Mol. Biol. 305, 1185-1194.

[22] Jager, M., Nguyen, H., Crane, J.C., Kelly, J.W., & Gruebele, M. (2001). The 
folding mechanism of a β-sheet: The WW domain. J. Mol. Biol. 311, 373-393.

[23] Northey, J. G. B., Di Nardo, A. A., & Davidson, A. R. (2002). Hydrophobic 
core packing in the SH3 domain folding transition state. Nature Struct. Biol. 
9, 126-130. 

[24] Garcia-Mira, M. M., Bohringer, D., & Schmid, F. X. (2004). The folding 
transition state of the cold shock protein is strongly polarized. J. Mol. Biol. 
339, 555-569. 






[25] Matouschek, A., Kellis, J. T., Serrano, L., & Fersht, A. R. (1989). Mapping the 
transition state and pathway of protein folding by protein engineering. Nature 
340, 122-126. 

[26] Fersht, A. R. (1999). Structure and Mechanism in Protein Science. W. H. Freeman,
New York.

[27] Ozkan, S. B., Bahar, I., & Dill, K. A. (2001). Transition states and the meaning 
of Φ-values in protein folding kinetics. Nature Struct. Biol. 8, 765-769.

[28] Zarrine-Afsar, A., & Davidson, A. R. (2004). The analysis of protein folding 
kinetic data produced in protein engineering experiments. Methods 34, 41-50. 

[29] Chang, I., Cieplak, M., Banavar, J. R., & Maritan, A. (2004). What can one 
learn from experiments about the elusive transition state? Protein Sci. 13, 
2446-2457. 

[30] Raleigh, D. P., & Plaxco, K. W. (2005). The protein folding transition state: 
what are Φ-values really telling us? Protein and Peptide Letters 12, 117-122.

[31] Li, A., & Daggett, V. (1994). Characterization of the transition state of protein 
unfolding by use of molecular dynamics: Chymotrypsin inhibitor 2. Proc. Natl. 
Acad. Sci. USA 91, 10430-10434. 

[32] Li, A., & Daggett, V. (1996). Identification and characterization of the unfolding
transition state of chymotrypsin inhibitor 2 by molecular dynamics simulations. 
J. Mol. Biol. 257, 412-429. 

[33] Lazaridis, T., & Karplus, M. (1997). "New View" of protein folding reconciled 
with the old through multiple unfolding simulations. Science 278, 1928-1931. 

[34] Vendruscolo, M., Paci, E., Dobson, C. M., & Karplus, M. (2001). Three key 
residues form a critical contact network in a protein folding transition state. 
Nature 409, 641-645. 

[35] Li, L., & Shakhnovich, E. I. (2001). Constructing, verifying, and dissecting the
folding transition state of chymotrypsin inhibitor 2 with all-atom simulations. 
Proc. Natl. Acad. Sci. USA 98, 13014-13018. 

[36] Gsponer, J., & Caflisch, A. (2002). Molecular dynamics simulations of protein 
folding from the transition state. Proc. Natl. Acad. Sci. USA 99, 6719-6724. 

[37] Paci, E., Vendruscolo, M., Dobson, C. M., & Karplus, M. (2002). Determination 
of a transition state at atomic resolution from protein engineering data. J. Mol. 
Biol. 324, 151-163. 

[38] Guo, W., Lampoudi, S., & Shea, J.-E. (2003). Posttransition state desolvation 
of the hydrophobic core of the src-SH3 protein domain. Biophys. J. 85, 61-69. 






[39] Settanni, G., Gsponer, J., & Caflisch, A. (2004). Formation of the folding
nucleus of an SH3 domain investigated by loosely coupled molecular dynamics 
simulations. Biophys. J. 86, 1691-1701. 

[40] Paci, E., Lindorff-Larsen, K., Dobson, C. M., Karplus, M., & Vendruscolo, M.
(2005). Transition state contact orders correlate with protein folding rates. J. 
Mol. Biol. 352, 495-500. 

[41] Salvatella, X., Dobson, C. M., Fersht, A. R., & Vendruscolo, M. (2005).
Determination of the folding transition states of barnase by using Φ-value-
restrained simulations validated by double mutant ΦIJ-values. Proc. Natl. Acad.
Sci. USA 102, 12389-12394. 

[42] Chong, L. T., Snow, C. D., Rhee, Y. M., & Pande, V. S. (2005). Dimerization of
the p53 oligomerization domain: Identification of a folding nucleus by molecular 
dynamics simulations. J. Mol. Biol. 345, 869-878. 

[43] Hubner, I. A., Edmonds, K. A., & Shakhnovich, E. I. (2005). Nucleation and
the transition state of the SH3 domain. J. Mol. Biol. 349, 424-434. 

[44] Duan, J. X., & Nilsson, L. (2005). Thermal unfolding simulations of a multimeric
protein - transition state and unfolding pathways. Proteins 59, 170-182. 

[45] Daggett, V., Li, A., Itzhaki, L. S., Otzen, D. E., & Fersht, A. R. (1996).
Structure of the transition state for folding of a protein derived from experiment 
and simulation. J. Mol. Biol. 257, 430-440. 

[46] Day, R., & Daggett, V. (2005). Sensitivity of the folding/unfolding transition
state ensemble of chymotrypsin inhibitor 2 to changes in temperature and
solvent. Protein Sci. 14, 1242-1252.

[47] Settanni, G., Rao, F., & Caflisch, A. (2005). Φ-value analysis by molecular
dynamics simulations of reversible folding. Proc. Natl. Acad. Sci. USA 102, 
628-633. 

[48] Lindorff-Larsen, K., Paci, E., Serrano, L., Dobson, C. M., & Vendruscolo, M.
(2003). Calculation of mutational free energy changes in transition states for 
protein folding. Biophys. J. 85, 1207-1214. 

[49] Goldenberg, D. P. (1999). Finding the right fold. Nature Struct. Biol. 6, 987-990.

[50] de los Rios, M. A., Daneshi, M., & Plaxco, K. W. (2005). Experimental
investigation of the frequency and substitution dependence of negative Φ-values
in two-state proteins. Biochemistry 44, 12160-12167. 

[51] Schuler, B., Lipman, E., k Eaton, W. A. (2002). Probing the free-energy surface 
for protein folding with single molecule fluorescence spectroscopy. Nature 419, 
743-747. 



[52] Akmal, A., & Muñoz, V. (2004). The nature of the free energy barriers to
two-state folding. Proteins 57, 142-152.

[53] Merlo, C., Dill, K. A., & Weikl, T. R. (2005). Φ values in protein-folding kinetics
have energetic and structural components. Proc. Natl. Acad. Sci. USA 102,
10171-10175.

[54] Muñoz, V., & Serrano, L. (1994). Elucidating the folding problem of α-helical
peptides using empirical parameters, II. Helix macrodipole effects and rational
modification of the helical content of natural peptides. J. Mol. Biol. 245,
275-296.

[55] Muñoz, V., & Serrano, L. (1994). Elucidating the folding problem of α-helical
peptides using empirical parameters III: Temperature and pH dependence. J.
Mol. Biol. 245, 297-308.

[56] Lacroix, E., Viguera, A. R., & Serrano, L. (1998). Elucidating the folding
problem of α-helices: Local motifs, long-range electrostatics, ionic strength
dependence and prediction of NMR parameters. J. Mol. Biol. 284, 173-191.

[57] Pace, C. N., & Scholtz, J. M. (1998). A helix propensity scale based on
experimental studies of peptides and proteins. Biophys. J. 75, 422-427.

[58] Serrano, L., Matouschek, A., & Fersht, A. R. (1992). The folding of an enzyme:
III. Structure of the transition state for unfolding of barnase analysed by a
protein engineering procedure. J. Mol. Biol. 224, 805-818.

[59] Jemth, P., Day, R., Gianni, S., Khan, F., Allen, M., Daggett, V., & Fersht, A. R.
(2005). The structure of the major transition state for folding of an FF domain
from experiment and simulation. J. Mol. Biol. 350, 363-378.

[60] Zhou, Z., Huang, Y. Z., & Bai, Y. W. (2005). An on-pathway hidden
intermediate and the early rate-limiting transition state of Rd-apocytochrome
b562 characterized by protein engineering. J. Mol. Biol. 352, 757-764.

[61] Scott, K. A., Randles, L. G., & Clarke, J. (2004). The folding of spectrin domains
II: Φ-value analysis of R16. J. Mol. Biol. 344, 207-221.

[62] Sánchez, I. E., & Kiefhaber, T. (2003). Origin of unusual Φ-values in protein
folding: evidence against specific nucleation sites. J. Mol. Biol. 334, 1077-1085.

[63] Fersht, A. R., & Sato, S. (2004). Φ-value analysis and the nature of
protein-folding transition states. Proc. Natl. Acad. Sci. USA 91, 10422-10425.

[64] De los Rios, M., Muralidhara, B. K., Wildes, D., Sosnick, T. R., Marqusee, S.,
Wittung-Stafshede, P., Plaxco, K. W., & Ruczinski, I. (2006). On the precision
of experimentally determined protein folding rates and Φ-values. Protein Sci.
15, 553-563.



Table 1: Helix of protein CI2

mutation   Φ       ΔG_N (kcal/mol)   ΔG_α^AGADIR (kcal/mol)   ΔG_α^Prop (kcal/mol)
S12G       0.29    0.8               0.28                     n/a
S12A       0.43    0.89              0.14                     n/a
E15D       0.22    0.74              0.13                     0.29
E15N       0.53    1.07              0.57                     0.25
A16G       1.06    1.09              0.82                     1.0
K17G       0.38    2.32              0.80                     0.74
K18G       0.7     0.99              0.75                     0.74
I20V       0.4     1.3               0.14                     0.2
L21A       0.25    1.33              -0.01                    -0.21
L21G       0.35    1.38              0.26                     0.79
D23A       -0.25   0.96              -0.41                    n/a
K24G       0.1     3.19              0.12                     n/a

Experimental Φ-values and stability changes ΔG_N are from Itzhaki et al. [1]. The
change in intrinsic helix stability ΔG_α^AGADIR is calculated with AGADIR [54,55,56];
see Merlo et al. [53]. The change in intrinsic helix stability ΔG_α^Prop is calculated
from the helix propensity scale of Pace and Scholtz [57]. The helix propensities of
the residues are (in kcal/mol): Ala (A) 0, Leu (L) 0.21, Arg (R) 0.21, Met (M) 0.24,
Lys (K) 0.26, Gln (Q) 0.39, Glu (E) 0.40, Ile (I) 0.41, Trp (W) 0.49, Ser (S) 0.50,
Tyr (Y) 0.53, Phe (F) 0.54, Val (V) 0.61, His (H) 0.61, Asn (N) 0.65, Thr (T) 0.66,
Cys (C) 0.68, Asp (D) 0.69, and Gly (G) 1. For the terminal residues 12, 13, 23, and
24 of the helix, the propensity scale is not applicable (n/a). We only consider
mutations with ΔG_N > 0.7 kcal/mol.
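The ΔG_α^Prop column can be recomputed from the propensity scale quoted in the caption: it is the propensity of the mutant residue minus that of the wildtype residue. The Python sketch below assumes mutation labels of the form wildtype letter, position, mutant letter (e.g. K17G); the helper name `delta_g_alpha` is our own, and the function does not itself exclude the terminal positions 12, 13, 23, and 24, for which the scale is not applicable.

```python
# Helix propensities of the Pace-Scholtz scale in kcal/mol, as listed above.
PROPENSITY = {
    "A": 0.0, "L": 0.21, "R": 0.21, "M": 0.24, "K": 0.26, "Q": 0.39,
    "E": 0.40, "I": 0.41, "W": 0.49, "S": 0.50, "Y": 0.53, "F": 0.54,
    "V": 0.61, "H": 0.61, "N": 0.65, "T": 0.66, "C": 0.68, "D": 0.69,
    "G": 1.0,
}


def delta_g_alpha(mutation: str) -> float:
    """Change in intrinsic helix stability (kcal/mol) for a label like 'K17G'.

    Positive values mean the mutant residue is a weaker helix former than
    the wildtype residue. Terminal helix positions are NOT filtered out here.
    """
    wildtype, mutant = mutation[0], mutation[-1]
    return round(PROPENSITY[mutant] - PROPENSITY[wildtype], 2)


for label in ["E15D", "E15N", "A16G", "K17G", "K18G", "I20V", "L21A", "L21G"]:
    print(label, delta_g_alpha(label))
```

For L21A the result is negative because Ala is a stronger helix former than Leu on this scale, which is why a mutation can destabilize the protein (ΔG_N > 0) while stabilizing the helix.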

Table 2: Helix 2 of protein A

mutation   Φ     ΔG_N (kcal/mol)   ΔG_α (kcal/mol)   tertiary contacts
A27G       1.0   1.0               1.0               none
A28G       0.6   2.2               1.0               Helix 1
A29G       1.1   1.0               1.0               none
F31A       0.3   3.9               -0.54             Helices 1, 3
F31G       0.5   4.7               0.46              Helices 1, 3
I32V       0.6   1.2               0.2               Helix 1
I32A       0.5   1.9               -0.41             Helix 1
I32G       0.6   3.4               0.59              Helix 1
A33G       1.1   0.9               1.0               none
A34G       0.7   1.2               1.0               none
L35A       0.4   2.4               -0.21             Helices 1, 3
L35G       0.5   4.1               0.79              Helices 1, 3

Experimental Φ-values and stability changes ΔG_N are from Sato et al. [15]. The
change in intrinsic helix stability ΔG_α is calculated from the helix propensity scale
of Pace and Scholtz [57]. Whether a mutation affects tertiary contacts with helices 1
and 3 is taken from the contact matrix of protein A shown in Fig. 2. At sites where
multiple mutations have been performed, we only consider Φ-values for single-residue
mutations with the wildtype sequence as reference state. For example, we consider the
Φ-values for the mutations I32V, I32A, and I32G in helix 2 of protein A, but not the
Φ-values for V32A and A32G also given by Sato et al. [15]. However, we include the
Φ-values for the Ala-Gly scanning mutants at residue positions 27, 28, 29, 33, and 34
given in Table 1 of Sato et al. [15].

Table 3: Helix 3 of protein A

mutation   Φ      ΔG_N (kcal/mol)   ΔG_α (kcal/mol)   tertiary contacts
A44G       -0.1   1.3               1.0               none
L45A       0.6    1.5               -0.21             Helix 2
L45G       0.3    4.4               0.79              Helix 2
L46A       0.2    1.9               -0.21             Helix 1
L46G       0.3    4.0               0.79              Helix 1
A47G       0.2    1.5               1.0               none
A48G       0.0    1.8               1.0               Helix 2
A49G       0.2    3.6               1.0               Helix 2
A51G       0.1    1.2               1.0               none
L52A       0.3    1.3               -0.21             Helix 2
L52G       0.1    3.8               0.79              Helix 2
A54G       0.0    1.4               1.0               none

Experimental Φ-values and stability changes ΔG_N are from Sato et al. [15]. The
change in intrinsic helix stability ΔG_α is calculated from helix propensities [57].
The information on tertiary contacts is taken from Fig. 2.

Table 4: Helix of protein L

mutation   Φ       ΔG_N (kcal/mol)   ΔG_α (kcal/mol)   tertiary contacts
A29G       0.23    2.41              1.0               Hairpin 1
T30A       0.08    1.31              -0.66             Hairpin 1
S31G       0.11    0.81              0.5               none
E32G       0.11    1.08              0.6               Hairpin 1
E32I       0.05    1.25              0.01              Hairpin 1
A33G       0.25    2.85              1.0               Hairpins 1, 2
Y34A       0.05    2.57              -0.53             Hairpin 2
A35G       0.28    1.2               1.0               none
Y36A       0.27    2.54              -0.53             Hairpin 1
A37G       0.11    3.14              1.0               Hairpin 2
D38A       -0.39   0.98              -0.69             Hairpin 2
D38G       -0.05   1.89              0.31              Hairpin 2

Experimental Φ-values and stability changes ΔG_N are from Kim et al. [6]. The
change in intrinsic helix stability ΔG_α is calculated from helix propensities [57]. The
two β-hairpins of protein L are defined in the caption of Fig. 3. The information on
tertiary interactions of helical residues with the hairpins is taken from this figure.

Table 5: Structural parameters, standard deviations, and correlation coefficients

helix                  tertiary contacts   χ_α     χ_t     SD     |r|
CI2 helix              all                 1.03    0.16    0.14   0.91
helix 2 of protein A   all                 0.98    0.46    0.10   0.93
                       with helix 1        0.98    0.52    0.12   0.90
helix 3 of protein A   all                 -0.07   0.31    0.13   0.75
                       with helix 1        -0.01   0.24    0.13   0.65
                       with helix 2        -0.09   0.34    0.13   0.79
helix of protein L     all                 0.30    0.06    0.15   0.63
                       with hairpin 1      0.21    0.15    0.10   (0.30)^a
                       with hairpin 2      0.32    -0.04   0.11   0.90


The structural parameters χ_α and χ_t, the estimated standard deviations SD of the
data points from the regression lines, and the absolute values |r| of the correlation
coefficient obtained in our model. The second column of the table indicates whether
we consider all mutations for a helix, or only mutations affecting tertiary interactions
with one structural element. The structural parameter χ_t then indicates either the
overall degree of tertiary structure formation in the transition state, or the degree of
tertiary structure formation with the given structural element. In both cases, we have
included the 'purely secondary' mutations that do not affect tertiary interactions.
The structural elements of proteins A and L are defined in Figs. 2 and 3. The
standard deviation is estimated as $\mathrm{SD} = \sqrt{\sum_{i=1}^{M} d_i^2 / (M-2)}$, where d_i is the
vertical deviation of data point i from the regression line, and M is the number of
data points. We estimate the errors in the structural parameters χ_α and χ_t, which
result from experimental and modeling errors, as ±0.05 for the CI2 helix and helix
2 of protein A, and as ±0.1 for helix 3 of protein A and the protein L helix.
^a For this data set, the correlation coefficient r is not a reasonable indicator of the
modeling quality since the slope of the regression line is close to 0. The precise value
of r is then dominated by the experimental errors in Φ. In our model, a slope of the
regression line close to 0 indicates that the two structural parameters χ_α and χ_t
have similar values; see eq. (3). The relatively small standard deviation SD of 0.10
for this data set shows that our model is in good agreement with the data.
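The entries of this table can be reproduced with an ordinary least-squares fit of Φ against ΔG_α/ΔG_N. We read eq. (3), which is not reproduced in this section, as $\Phi = \chi_t + (\chi_\alpha - \chi_t)\,\Delta G_\alpha/\Delta G_N$, so that the intercept of a regression line is χ_t and intercept plus slope is χ_α; every regression line and parameter pair reported in the figure captions and in this table is consistent with that reading up to rounding. The Python sketch below (the names `fit_phi` and `HELIX2_PROTEIN_A` are our own) applies it, together with the SD formula above, to the 12 mutations of helix 2 of protein A from Table 2.

```python
import math

# (Phi, dG_N, dG_alpha) for helix 2 of protein A, copied from Table 2 (kcal/mol).
HELIX2_PROTEIN_A = [
    (1.0, 1.0, 1.0),    # A27G
    (0.6, 2.2, 1.0),    # A28G
    (1.1, 1.0, 1.0),    # A29G
    (0.3, 3.9, -0.54),  # F31A
    (0.5, 4.7, 0.46),   # F31G
    (0.6, 1.2, 0.2),    # I32V
    (0.5, 1.9, -0.41),  # I32A
    (0.6, 3.4, 0.59),   # I32G
    (1.1, 0.9, 1.0),    # A33G
    (0.7, 1.2, 1.0),    # A34G
    (0.4, 2.4, -0.21),  # L35A
    (0.5, 4.1, 0.79),   # L35G
]


def fit_phi(points):
    """Least-squares line Phi = a + b * x with x = dG_alpha / dG_N.

    Returns (a, b, |r|, SD), where SD = sqrt(sum(d_i**2) / (M - 2)) and
    d_i are the vertical deviations of the points from the line.
    """
    xs = [dga / dgn for _, dgn, dga in points]
    ys = [phi for phi, _, _ in points]
    m = len(points)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2))
    return a, b, abs(r), sd


a, b, r_abs, sd = fit_phi(HELIX2_PROTEIN_A)
chi_t, chi_alpha = a, a + b  # intercept, and intercept plus slope
print(f"chi_alpha={chi_alpha:.2f} chi_t={chi_t:.2f} SD={sd:.2f} |r|={r_abs:.2f}")
```

Run on the tabulated values, the fit should reproduce the "all" row for helix 2 of protein A (χ_α ≈ 0.98, χ_t ≈ 0.46, SD ≈ 0.10, |r| ≈ 0.93).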

Fig. 1. In our model, the transition-state ensemble T consists of M transition-state
conformations T_1, T_2, ..., T_M. The arrows indicate the folding direction from the
denatured state D to the native state N via the transition-state conformations.

[Figure: contact matrix of protein A, residue number versus residue number]

Fig. 2. Contact matrix of protein A. A black dot at position (i, j) of the matrix
indicates that the two non-neighboring residues i and j are in contact in the native
structure (Protein Data Bank file 1SS1, model 1). Two residues are defined here to
be in contact if the distance between any of their non-hydrogen atoms is smaller
than the cutoff distance of 4 Å. Protein A is an α-helical protein with three helices.
Helix 1 consists of residues 10 to 19, helix 2 of residues 25 to 37, and helix 3 of
residues 42 to 56.
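The contact definition in this caption can be sketched in a few lines of Python. The sketch assumes atoms are already available as (element, (x, y, z)) tuples per residue number; it does not read the PDB file itself, and the coordinates below are hypothetical toy values chosen only to exercise the rules (heavy atoms only, sequence neighbors skipped, distance strictly below 4 Å). The function name `contacts` is our own.

```python
import math
from itertools import combinations

CUTOFF = 4.0  # Angstrom, the cutoff distance stated in the caption


def contacts(residues, cutoff=CUTOFF):
    """Return the set of pairs (i, j), i < j, of non-neighboring residues in contact.

    `residues` maps a residue number to a list of (element, (x, y, z)) atoms.
    Two residues are in contact if any pair of their non-hydrogen atoms is
    closer than `cutoff`. Sequence neighbors (j = i + 1) are skipped.
    """
    heavy = {
        res: [xyz for element, xyz in atoms if element != "H"]
        for res, atoms in residues.items()
    }
    pairs = set()
    for i, j in combinations(sorted(heavy), 2):
        if j - i < 2:
            continue
        if any(math.dist(p, q) < cutoff for p in heavy[i] for q in heavy[j]):
            pairs.add((i, j))
    return pairs


# Hypothetical toy coordinates (not taken from 1SS1): residue 3 carries a
# hydrogen close to residue 1, which must not count as a contact.
toy = {
    1: [("C", (0.0, 0.0, 0.0))],
    2: [("C", (3.8, 0.0, 0.0))],
    3: [("C", (7.6, 0.0, 0.0)), ("H", (0.5, 3.0, 0.0))],
    4: [("C", (0.0, 3.5, 0.0))],
}
print(sorted(contacts(toy)))
```

Only the pair (1, 4) is reported here: residues 1 and 3 are close only through a hydrogen, and residues 3 and 4 are sequence neighbors. Filling a symmetric matrix from these pairs gives a plot of the kind shown in Figs. 2 and 3.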

[Figure: contact matrix of protein L, residue number versus residue number]

Fig. 3. Contact matrix of protein L (Protein Data Bank file 1HZ6, residues A1 to
A62). The structure of protein L consists of two β-hairpins at the termini, and an
α-helix in between. The helix consists of residues 26 to 40. Hairpin 1 at the
N-terminus includes residues 4 to 24, and hairpin 2 at the C-terminus includes
residues 47 to 62.

[Figure: Helix of CI2, Φ plotted against ΔG_α/ΔG_N]

Fig. 4. Analysis of Φ-values for mutations in the helix of protein CI2. The
change in intrinsic helix stability ΔG_α for the 12 mutations has been calculated with
AGADIR (see Table 1). We only consider mutations with experimentally measured
stability changes ΔG_N > 0.7 kcal/mol. The Pearson correlation coefficient of the 12
data points is 0.91. From the regression line $\Phi = 0.16 + 0.87\,\Delta G_\alpha/\Delta G_N$, we obtain
the structural parameters χ_α = 1.03 ± 0.05 and χ_t = 0.16 ± 0.05. The structural
parameter χ_α close to 1 indicates that the helix is fully formed in the transition
state. The parameter χ_t indicates that tertiary interactions are, on average, present
in the transition state to a degree of around 16%.
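As a check on the numbers in this caption, the Python sketch below refits the 12 CI2 data points of Table 1 by ordinary least squares, using the AGADIR column for ΔG_α. It assumes, as the reported parameters imply, that the intercept of the regression line is χ_t and that intercept plus slope is χ_α; the function name `fit_phi` and the data layout are our own choices.

```python
import math

# (Phi, dG_N, dG_alpha from AGADIR) for the CI2 helix, copied from Table 1 (kcal/mol).
CI2_HELIX = [
    (0.29, 0.80, 0.28),    # S12G
    (0.43, 0.89, 0.14),    # S12A
    (0.22, 0.74, 0.13),    # E15D
    (0.53, 1.07, 0.57),    # E15N
    (1.06, 1.09, 0.82),    # A16G
    (0.38, 2.32, 0.80),    # K17G
    (0.70, 0.99, 0.75),    # K18G
    (0.40, 1.30, 0.14),    # I20V
    (0.25, 1.33, -0.01),   # L21A
    (0.35, 1.38, 0.26),    # L21G
    (-0.25, 0.96, -0.41),  # D23A
    (0.10, 3.19, 0.12),    # K24G
]


def fit_phi(points):
    """Least-squares line Phi = a + b * x with x = dG_alpha / dG_N; returns (a, b, |r|, SD)."""
    xs = [dga / dgn for _, dgn, dga in points]
    ys = [phi for phi, _, _ in points]
    m = len(points)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2))
    return a, b, abs(r), sd


a, b, r_abs, sd = fit_phi(CI2_HELIX)
print(f"Phi = {a:.2f} + {b:.2f} x  (chi_t={a:.2f}, chi_alpha={a + b:.2f}, |r|={r_abs:.2f})")
```

With the tabulated values it should give an intercept of about 0.16, a slope of about 0.87, and |r| of about 0.91, matching the regression line quoted in the caption.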

[Figure: Helix 2 of protein A, Φ plotted against ΔG_α/ΔG_N]

Fig. 5. Analysis of Φ-values for helix 2 of protein A. The solid line represents
the regression line $\Phi = 0.46 + 0.52\,\Delta G_\alpha/\Delta G_N$ for all points. The correlation
coefficient of the data points is 0.93. The dashed line is the regression line
$\Phi = 0.52 + 0.46\,\Delta G_\alpha/\Delta G_N$ of the 8 data points for mutations of residues that
have either no tertiary interactions or tertiary interactions with helix 1 (see also
Table 2). The correlation coefficient of these data points is 0.90. From the
regression lines and eq. (3), we obtain the structural parameters χ_α and χ_t shown
in Table 5. The values of χ_α close to 1 indicate that the helix is fully formed in the
transition state, and the values of χ_t close to 0.5 indicate that tertiary interactions
are present to a degree of about 50%.
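The dashed line in this figure uses only mutations with no tertiary contacts or contacts with helix 1 alone. A minimal Python sketch of that selection and refit follows; the rows are copied from Table 2, mutations without tertiary contacts are labelled "none" here, and all names are our own. It again assumes that the intercept is χ_t and intercept plus slope is χ_α.

```python
import math

# (label, Phi, dG_N, dG_alpha, tertiary contacts) for helix 2 of protein A (Table 2).
HELIX2 = [
    ("A27G", 1.0, 1.0, 1.0, "none"),
    ("A28G", 0.6, 2.2, 1.0, "helix 1"),
    ("A29G", 1.1, 1.0, 1.0, "none"),
    ("F31A", 0.3, 3.9, -0.54, "helices 1, 3"),
    ("F31G", 0.5, 4.7, 0.46, "helices 1, 3"),
    ("I32V", 0.6, 1.2, 0.2, "helix 1"),
    ("I32A", 0.5, 1.9, -0.41, "helix 1"),
    ("I32G", 0.6, 3.4, 0.59, "helix 1"),
    ("A33G", 1.1, 0.9, 1.0, "none"),
    ("A34G", 0.7, 1.2, 1.0, "none"),
    ("L35A", 0.4, 2.4, -0.21, "helices 1, 3"),
    ("L35G", 0.5, 4.1, 0.79, "helices 1, 3"),
]


def fit_phi(points):
    """Least-squares line Phi = a + b * x with x = dG_alpha / dG_N; returns (a, b, |r|, SD)."""
    xs = [dga / dgn for _, dgn, dga in points]
    ys = [phi for phi, _, _ in points]
    m = len(points)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2))
    return a, b, abs(r), sd


# Keep 'purely secondary' mutations plus those touching helix 1 only.
subset = [(phi, dgn, dga) for _, phi, dgn, dga, c in HELIX2 if c in ("none", "helix 1")]
a, b, r_abs, sd = fit_phi(subset)
print(len(subset), f"Phi = {a:.2f} + {b:.2f} x, |r| = {r_abs:.2f}, SD = {sd:.2f}")
```

Eight rows pass the filter, and the fit should return roughly Φ = 0.52 + 0.46 ΔG_α/ΔG_N with |r| ≈ 0.90, in line with the caption and the "with helix 1" row of Table 5.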

[Figure: Helix 3 of protein A, Φ plotted against ΔG_α/ΔG_N; symbols indicate tertiary contacts: none, with helix 1, with helix 2]

Fig. 6. Analysis of Φ-values for mutations in helix 3 of protein A. The solid line
represents the regression line $\Phi = 0.31 - 0.38\,\Delta G_\alpha/\Delta G_N$ of all data points; the
dashed line is the regression line $\Phi = 0.24 - 0.25\,\Delta G_\alpha/\Delta G_N$ of the data points for
mutations that affect tertiary interactions with helix 1 (or no tertiary interactions);
and the dotted line is the regression line $\Phi = 0.34 - 0.43\,\Delta G_\alpha/\Delta G_N$ of the data
points for mutations that affect tertiary interactions with helix 2 (or no tertiary
interactions). The absolute values of the correlation coefficient for these three data
sets are |r| = 0.75, 0.65, and 0.79, respectively (see Table 5).
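For helix 3 the regression slope is negative. Reading the intercept as χ_t and intercept plus slope as χ_α (our reading of eq. (3), consistent with the values in Table 5), this places χ_α slightly below 0 and χ_t near 0.3. The minimal least-squares sketch below, with names of our own choosing, refits all 12 mutations of Table 3 to make that arithmetic explicit.

```python
import math

# (Phi, dG_N, dG_alpha) for helix 3 of protein A, copied from Table 3 (kcal/mol).
HELIX3 = [
    (-0.1, 1.3, 1.0),   # A44G
    (0.6, 1.5, -0.21),  # L45A
    (0.3, 4.4, 0.79),   # L45G
    (0.2, 1.9, -0.21),  # L46A
    (0.3, 4.0, 0.79),   # L46G
    (0.2, 1.5, 1.0),    # A47G
    (0.0, 1.8, 1.0),    # A48G
    (0.2, 3.6, 1.0),    # A49G
    (0.1, 1.2, 1.0),    # A51G
    (0.3, 1.3, -0.21),  # L52A
    (0.1, 3.8, 0.79),   # L52G
    (0.0, 1.4, 1.0),    # A54G
]


def fit_phi(points):
    """Least-squares line Phi = a + b * x with x = dG_alpha / dG_N; returns (a, b, |r|, SD)."""
    xs = [dga / dgn for _, dgn, dga in points]
    ys = [phi for phi, _, _ in points]
    m = len(points)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2))
    return a, b, abs(r), sd


a, b, r_abs, sd = fit_phi(HELIX3)
chi_t, chi_alpha = a, a + b  # a negative slope gives chi_alpha < chi_t
print(f"chi_alpha={chi_alpha:.2f} chi_t={chi_t:.2f} |r|={r_abs:.2f} SD={sd:.2f}")
```

The fit should land close to the solid line of the figure (intercept ≈ 0.31, slope ≈ -0.38) and to the "all" row for helix 3 in Table 5.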

[Figure: Helix of protein L, Φ plotted against ΔG_α/ΔG_N; symbols indicate tertiary contacts: none, with hairpin 1, with hairpin 2, with hairpins 1 and 2]

Fig. 7. Analysis of Φ-values for mutations in the helix of protein L. The solid line
represents the regression line $\Phi = 0.06 + 0.24\,\Delta G_\alpha/\Delta G_N$ for all data points; the
dotted line is the regression line $\Phi = 0.15 + 0.06\,\Delta G_\alpha/\Delta G_N$ of the 7 data points
for mutations that affect tertiary interactions with hairpin 1 or no tertiary
interactions (see also Table 4); and the dashed line is the regression line
$\Phi = -0.04 + 0.37\,\Delta G_\alpha/\Delta G_N$ of the 6 data points for mutations affecting tertiary
interactions with hairpin 2 or no tertiary interactions.
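The solid line in this figure can be reproduced from the 12 rows of Table 4 with the same least-squares procedure. The Python sketch below is a minimal version under the assumption, consistent with Table 5, that the intercept equals χ_t and intercept plus slope equals χ_α; the names are our own, and it fits all data points rather than the hairpin-specific subsets.

```python
import math

# (Phi, dG_N, dG_alpha) for the helix of protein L, copied from Table 4 (kcal/mol).
PROTEIN_L_HELIX = [
    (0.23, 2.41, 1.0),     # A29G
    (0.08, 1.31, -0.66),   # T30A
    (0.11, 0.81, 0.5),     # S31G
    (0.11, 1.08, 0.6),     # E32G
    (0.05, 1.25, 0.01),    # E32I
    (0.25, 2.85, 1.0),     # A33G
    (0.05, 2.57, -0.53),   # Y34A
    (0.28, 1.2, 1.0),      # A35G
    (0.27, 2.54, -0.53),   # Y36A
    (0.11, 3.14, 1.0),     # A37G
    (-0.39, 0.98, -0.69),  # D38A
    (-0.05, 1.89, 0.31),   # D38G
]


def fit_phi(points):
    """Least-squares line Phi = a + b * x with x = dG_alpha / dG_N; returns (a, b, |r|, SD)."""
    xs = [dga / dgn for _, dgn, dga in points]
    ys = [phi for phi, _, _ in points]
    m = len(points)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2))
    return a, b, abs(r), sd


a, b, r_abs, sd = fit_phi(PROTEIN_L_HELIX)
print(f"Phi = {a:.2f} + {b:.2f} x, chi_alpha={a + b:.2f}, |r|={r_abs:.2f}, SD={sd:.2f}")
```

The result should be close to Φ = 0.06 + 0.24 ΔG_α/ΔG_N with χ_α ≈ 0.30, |r| ≈ 0.63, and SD ≈ 0.15, the "all" row for protein L in Table 5. The small χ_α and χ_t both indicate a transition state in which this helix is largely unstructured.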